Central limit theorem related to MDR-method

Alexander Bulinski



In many medical and biological investigations, including genetics, it is typical to handle
high-dimensional data which can be viewed as a set of values of some factors and a binary
response variable. For instance, the response variable can describe the state of a patient's
health, and one often assumes that it depends only on some of the factors. An important
problem is to determine collections of significant factors. In this regard we turn to the
MDR-method introduced by M. Ritchie and coauthors. Our recent paper provided necessary
and sufficient conditions for strong consistency of estimates of the prediction error employing
$K$-fold cross-validation and an arbitrary penalty function. Here we introduce
regularized versions of these estimates and prove a multidimensional
CLT for them. Statistical variants of the CLT involving self-normalization are discussed as well.



Keywords and phrases: binary response variable, significant factors, penalty function, 
cross-validation, MDR-method, SLLN for arrays, strong consistency, regularized estimates, 
multidimensional CLT, self-normalization. 

AMS classification: 60F05; 60F15; 62P10. 




1 Introduction 

High dimensional data arise naturally in a number of experiments. Very often such data
are viewed as the values of some factors $X_1, \ldots, X_n$ and the corresponding response variable
$Y$. For example, in medical studies such a response variable $Y$ can describe the health state
(e.g., $Y = 1$ or $Y = -1$ mean "sick" or "healthy") and $X_1, \ldots, X_m$ and $X_{m+1}, \ldots, X_n$ are
genetic and non-genetic factors, respectively. Usually $X_i$ ($1 \le i \le m$) characterizes a single
nucleotide polymorphism (SNP), i.e. a certain change of nucleotide bases adenine, cytosine,
thymine and guanine (these genetic notions can be found, e.g., in [2]) in a specified segment
of a DNA molecule. In this case one considers $X_i$ with three values, for instance, 0, 1 and 2
(see, e.g., [1]). It is convenient to suppose that the other $X_i$ ($m + 1 \le i \le n$) take values in
$\{0, 1, 2\}$ as well. For example, the range of blood pressure can be partitioned into zones of
low, normal and high values. However, in what follows we suppose that all factors take values
in an arbitrary finite set. A binary response variable can also appear in pharmacological
experiments where $Y = 1$ means that the medicament is efficient and $Y = -1$ otherwise.

A challenging problem is to find the genetic and non-genetic (or environmental) factors
which could increase the risk of complex diseases such as diabetes, myocardial infarction and
others. Most specialists now share the paradigm that, in contrast to simple diseases
(such as sickle-cell anemia), certain combinations of the "damages" of the DNA molecule could
be responsible for provoking a complex disease whereas single mutations need not have



1 Faculty of Mathematics and Mechanics, Lomonosov Moscow State University, Moscow 119991, Russia. 
E-mail: bulinski@mech.math.msu.su 

2 The work is partially supported by RFBR grant 13-01-00612. 






dangerous effects (see, e.g., [15]). The important research domain called genome-wide
association studies (GWAS) inspires the development of new methods for handling large arrays
of biostatistical data. Here we continue our treatment of the multifactor dimensionality
reduction (MDR) method introduced by M. Ritchie et al. [13]. The idea of this method goes
back to the Michalski algorithm. A comprehensive survey concerning the MDR method is
provided in jH]; on subsequent modifications and applications see, e.g., [5], [7], [12], pZ]
and [18]. Other complementary methods applied in GWAS are discussed, e.g., in [I], where
one can find further references.

In [3] the basis for application of the MDR-method was proposed for the case when one uses an
arbitrary penalty function to describe the prediction error of the binary response variable by
means of a function of the factors. The goal of the present paper is to establish a new
multidimensional central limit theorem (CLT) for statistics which permit one to justify the optimal
choice of a subcollection of the explanatory variables.

2 Auxiliary results 

Let $X = (X_1, \ldots, X_n)$ be a random vector with components $X_i : \Omega \to \{0, 1, \ldots, q\}$ where
$i = 1, \ldots, n$ ($q$, $n$ are positive integers). Thus, $X$ takes values in $\mathbb{X} = \{0, 1, \ldots, q\}^n$. Introduce
a random (response) variable $Y : \Omega \to \{-1, 1\}$, a non-random function $f : \mathbb{X} \to \{-1, 1\}$ and
a penalty function $\psi : \{-1, 1\} \to \mathbb{R}_+$ (the trivial case $\psi \equiv 0$ is excluded). The quality of
approximation of $Y$ by $f(X)$ is defined as follows

$$Err(f) := E|Y - f(X)|\psi(Y). \tag{1}$$

Set $M = \{x \in \mathbb{X} : P(X = x) > 0\}$ and

$$F(x) = \psi(-1)P(Y = -1 \mid X = x) - \psi(1)P(Y = 1 \mid X = x), \quad x \in M.$$

It is not difficult to show (see [3]) that the collection of optimal functions, i.e. all functions
$f : \mathbb{X} \to \{-1, 1\}$ which are solutions of the problem $Err(f) \to \inf$, has the form

$$f = I\{A\} - I\{\overline{A}\}, \quad A \in \mathcal{A}, \tag{2}$$

where $I\{A\}$ stands for the indicator of $A$ ($I\{\emptyset\} := 0$) and $\mathcal{A}$ consists of the sets

$$A = \{x \in M : F(x) < 0\} \cup B \cup C.$$

Here $B$ is an arbitrary subset of $\{x \in M : F(x) = 0\}$ and $C$ is any subset of $\overline{M} := \mathbb{X} \setminus M$. If
we take $A^* = \{x \in M : F(x) < 0\}$, then $A^*$ has the minimal cardinality among all sets
in $\mathcal{A}$. In view of the relation $\psi(-1) + \psi(1) \neq 0$ we have

$$A^* = \{x \in M : P(Y = 1 \mid X = x) > \gamma(\psi)\}, \quad \gamma(\psi) := \psi(-1)/(\psi(-1) + \psi(1)). \tag{3}$$

If $\psi(1) = 0$ then $A^* = \emptyset$. If $\psi(1) \neq 0$ and $\psi(-1)/\psi(1) = a$ where $a \in \mathbb{R}_+$ then $A^* =
\{x \in M : P(Y = 1 \mid X = x) > a/(1 + a)\}$. Note that we can rewrite (1) as follows

$$Err(f) = 2 \sum_{y \in \{-1,1\}} \psi(y) P(Y = y, f(X) \neq y).$$

The value $Err(f)$ is unknown as we do not know the law of the random vector $(X, Y)$. Thus,
statistical inference on the quality of approximation of $Y$ by means of $f(X)$ is based on an
estimate of $Err(f)$.
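
The definitions above can be checked numerically on a small example. The following is a minimal sketch, assuming a hypothetical joint law of $(X, Y)$ with $X \in \{0,1\}^2$ and an arbitrarily chosen penalty $\psi$ (none of these numbers come from the paper). It evaluates (1), compares it with the rewritten form, and confirms by brute force that the rule built from $A^*$ in (3) attains the minimum of $Err(f)$.

```python
from itertools import product

# Hypothetical toy law of (X, Y): keys are (x, y), values are P(X = x, Y = y).
joint = {
    ((0, 0), 1): 0.05, ((0, 0), -1): 0.20,
    ((0, 1), 1): 0.15, ((0, 1), -1): 0.10,
    ((1, 0), 1): 0.10, ((1, 0), -1): 0.15,
    ((1, 1), 1): 0.20, ((1, 1), -1): 0.05,
}
psi = {1: 1.0, -1: 2.0}  # assumed penalty function, for illustration only

def err(f):
    """Err(f) = E|Y - f(X)| psi(Y), as in (1)."""
    return sum(p * abs(y - f[x]) * psi[y] for (x, y), p in joint.items())

def err_alt(f):
    """Equivalent form 2 * sum_y psi(y) P(Y = y, f(X) != y)."""
    return 2 * sum(p * psi[y] for (x, y), p in joint.items() if f[x] != y)

points = sorted({x for x, _ in joint})
gamma = psi[-1] / (psi[-1] + psi[1])  # threshold gamma(psi) from (3)

def cond_p1(x):
    """P(Y = 1 | X = x); every x here has positive probability."""
    return joint[(x, 1)] / (joint[(x, 1)] + joint[(x, -1)])

# Optimal rule (2) with A = A*: predict 1 exactly on A*.
f_star = {x: 1 if cond_p1(x) > gamma else -1 for x in points}

# Brute force over all 2^4 functions f: X -> {-1, 1}.
all_f = [dict(zip(points, vals)) for vals in product((-1, 1), repeat=len(points))]
best = min(err(f) for f in all_f)
```

In this toy law $\gamma(\psi) = 2/3$, so only the point $(1,1)$ belongs to $A^*$.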

Let $\xi^1, \xi^2, \ldots$ be i.i.d. random vectors with the same law as the vector $(X, Y)$. For $N \in \mathbb{N}$
set $\xi_N = \{\xi^1, \ldots, \xi^N\}$. To approximate $Err(f)$, as $N \to \infty$, we will use a prediction
algorithm. It involves a function $f_{PA} = f_{PA}(x, \xi_N)$ with values in $\{-1, 1\}$ which is defined
for $x \in \mathbb{X}$ and $\xi_N$. In fact we use a family of functions $f_{PA}(x, v_m)$ defined for $x \in \mathbb{X}$ and
$v_m \in \mathbb{V}_m$ where $\mathbb{V}_m := (\mathbb{X} \times \{-1, 1\})^m$, $m \in \mathbb{N}$, $m \le N$. To simplify the notation we write
$f_{PA}(x, v_m)$ instead of $f^m_{PA}(x, v_m)$. For $S \subset \{1, \ldots, N\}$ ("$\subset$" means non-strict inclusion "$\subseteq$")
put $\xi_N(S) = \{\xi^j, j \in S\}$ and $\overline{S} := \{1, \ldots, N\} \setminus S$. For $K \in \mathbb{N}$ ($K > 1$) introduce a partition
of $\{1, \ldots, N\}$ formed by the subsets

$$S_k(N) = \{(k-1)[N/K] + 1, \ldots, k[N/K]\, I\{k < K\} + N I\{k = K\}\}, \quad k = 1, \ldots, K,$$

here $[b]$ is the integer part of a number $b \in \mathbb{R}$. Generalizing [1] we can construct an estimate
of $Err(f)$ using a sample $\xi_N$, a prediction algorithm with $f_{PA}$ and $K$-fold cross-validation
where $K \in \mathbb{N}$, $K > 1$ (on cross-validation see, e.g., []]). Namely, let



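$$\widehat{Err}_K(f_{PA}, \xi_N) := 2 \sum_{y \in \{-1,1\}} \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in S_k(N)} \frac{\widehat{\psi}(y, S_k(N))\, I\{Y^j = y,\ f_{PA}(X^j, \xi_N(\overline{S_k(N)})) \neq y\}}{\sharp S_k(N)}. \tag{4}$$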

For each $k = 1, \ldots, K$, the random variables $\widehat{\psi}(y, S_k(N))$ denote strongly consistent estimates
(as $N \to \infty$) of $\psi(y)$, $y \in \{-1, 1\}$, constructed from the data $\{Y^j, j \in S_k(N)\}$, and $\sharp S$ stands
for the cardinality of a finite set $S$. We call $\widehat{Err}_K(f_{PA}, \xi_N)$ an estimated prediction error.
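
The partition $S_k(N)$ and the estimate (4) can be sketched in code. This is only an illustration under stated assumptions: the helper names `folds` and `estimated_error` are hypothetical, the prediction algorithm is passed in as an arbitrary callable `train`, and $\widehat{\psi}(y, S_k(N))$ is taken to be the reciprocal of the frequency of $y$ in the fold (one possible strongly consistent estimate when $\psi(y) = 1/P(Y = y)$), with the convention $0/0 := 0$.

```python
def folds(N, K):
    """Index sets S_k(N), k = 1..K, with 1-based indices; the last fold absorbs the remainder."""
    b = N // K  # integer part [N/K]
    return [list(range((k - 1) * b + 1, (k * b if k < K else N) + 1))
            for k in range(1, K + 1)]

def estimated_error(sample, K, train):
    """K-fold estimate in the spirit of (4).

    sample : list of pairs (x, y) with y in {-1, 1}
    train  : callable mapping a list of pairs to a classifier x -> {-1, 1}
             (a stand-in for f_PA built from the complement of the fold)
    """
    N = len(sample)
    total = 0.0
    for S in folds(N, K):
        in_fold = set(S)
        test = [sample[j - 1] for j in S]
        rest = [sample[j - 1] for j in range(1, N + 1) if j not in in_fold]
        f_pa = train(rest)
        for y in (-1, 1):
            n_y = sum(1 for _, yy in test if yy == y)
            if n_y == 0:
                continue  # convention 0/0 := 0
            psi_hat = len(test) / n_y  # assumed estimate: 1 / (fold frequency of y)
            miss = sum(1 for x, yy in test if yy == y and f_pa(x) != y)
            total += psi_hat * miss / len(test)
    return 2 * total / K

# Usage: a rule that always predicts 1 misclassifies every observation with Y = -1.
toy = [(0, 1), (0, -1)] * 6
always_one = lambda rest: (lambda x: 1)
```

With this choice of $\widehat{\psi}$ each fold contributes the sum of the two class-wise error rates, so the constant rule above yields the value 2.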
The following theorem giving a criterion of validity of the relation

$$\widehat{Err}_K(f_{PA}, \xi_N) \to Err(f) \ \text{a.s.}, \quad N \to \infty, \tag{5}$$

was established in [3] (further on a sum over the empty set is equal to 0 as usual).

Theorem 1 Let $f_{PA}$ define a prediction algorithm for a function $f : \mathbb{X} \to \{-1, 1\}$. Assume
that there exists a set $U \subset \mathbb{X}$ such that for each $x \in U$ and any $k = 1, \ldots, K$ one has

$$f_{PA}(x, \xi_N(\overline{S_k(N)})) \to f(x) \ \text{a.s.}, \quad N \to \infty. \tag{6}$$

Then (5) is valid if and only if, for $N \to \infty$,

$$\sum_{k=1}^{K} \Big( \sum_{x \in \mathbb{X}^+} I\{f_{PA}(x, \xi_N(\overline{S_k(N)})) = -1\} L(x) - \sum_{x \in \mathbb{X}^-} I\{f_{PA}(x, \xi_N(\overline{S_k(N)})) = 1\} L(x) \Big) \to 0 \ \text{a.s.} \tag{7}$$

Here $\mathbb{X}^+ := (\mathbb{X} \setminus U) \cap \{x \in M : f(x) = 1\}$, $\mathbb{X}^- := (\mathbb{X} \setminus U) \cap \{x \in M : f(x) = -1\}$ and

$$L(x) = \psi(1)P(X = x, Y = 1) - \psi(-1)P(X = x, Y = -1), \quad x \in \mathbb{X}.$$

The sense of this result is the following. It shows that one has to demand condition (7)
outside the set $U$ (i.e. outside the set where $f_{PA}$ provides the a.s. approximation of $f$) to
obtain (5).
Corollary 1 ([3]) Let, for a function $f : \mathbb{X} \to \{-1, 1\}$, a prediction algorithm be defined by
$f_{PA}$. Suppose that there exists a set $U \subset \mathbb{X}$ such that for each $x \in U$ and any $k = 1, \ldots, K$
relation (6) is true. If

$$L(x) = 0 \quad \text{for } x \in (\mathbb{X} \setminus U) \cap M$$

then (5) is satisfied.

Note also that Remark 4 from [3] explains why the choice of a penalty function proposed
by Velez et al. pi]:

$$\psi(y) = c\,(P(Y = y))^{-1}, \quad y \in \{-1, 1\}, \quad c > 0, \tag{8}$$

is natural. Further discussion and examples can be found in [3].

3 Main results and proofs 

In many situations it is reasonable to suppose that the response variable $Y$ depends only
on a subcollection $X_{k_1}, \ldots, X_{k_r}$ of the explanatory variables, $\{k_1, \ldots, k_r\}$ being a subset of
$\{1, \ldots, n\}$. It means that for any $x \in M$

$$P(Y = 1 \mid X_1 = x_1, \ldots, X_n = x_n) = P(Y = 1 \mid X_{k_1} = x_{k_1}, \ldots, X_{k_r} = x_{k_r}). \tag{9}$$

In the framework of complex disease analysis it is natural to assume that only part of
the risk factors could provoke the disease and the impact of the others can be neglected. Any
collection $\{k_1, \ldots, k_r\}$ implying (9) is called significant. Evidently if $\{k_1, \ldots, k_r\}$ is significant
then any collection $\{m_1, \ldots, m_i\}$ such that $\{k_1, \ldots, k_r\} \subset \{m_1, \ldots, m_i\}$ is significant
as well. For a set $D \subset \mathbb{X}$ let $\pi_{k_1,\ldots,k_r} D := \{u = (x_{k_1}, \ldots, x_{k_r}) : x = (x_1, \ldots, x_n) \in D\}$. For
$B \subset \mathbb{X}_r$ where $\mathbb{X}_r := \{0, 1, \ldots, q\}^r$ define in $\mathbb{X} = \mathbb{X}_n$ a cylinder

$$C_{k_1,\ldots,k_r}(B) := \{x = (x_1, \ldots, x_n) \in \mathbb{X} : (x_{k_1}, \ldots, x_{k_r}) \in B\}.$$

For $B = \{u\}$ where $u = (u_1, \ldots, u_r) \in \mathbb{X}_r$ we write $C_{k_1,\ldots,k_r}(u)$ instead of $C_{k_1,\ldots,k_r}(\{u\})$.
Obviously

$$P(Y = 1 \mid X_{k_1} = x_{k_1}, \ldots, X_{k_r} = x_{k_r}) = P(Y = 1 \mid X \in C_{k_1,\ldots,k_r}(u)),$$

here

$$u = \pi_{k_1,\ldots,k_r}\{x\}, \quad \text{i.e. } u_i = x_{k_i}, \quad i = 1, \ldots, r. \tag{10}$$

For $C \subset \mathbb{X}$, $N \in \mathbb{N}$ and $W_N \subset \{1, \ldots, N\}$ set

$$\widehat{P}_{W_N}(Y = 1 \mid X \in C) := \frac{\sum_{j \in W_N} I\{Y^j = 1, X^j \in C\}}{\sum_{j \in W_N} I\{X^j \in C\}}. \tag{11}$$

When $C = \mathbb{X}$ we write simply $\widehat{P}_{W_N}(Y = 1)$ in (11). According to the strong law of large
numbers for arrays (SLLNA), see, e.g., [16], for any $C \subset \mathbb{X}$ with $P(X \in C) > 0$

$$\widehat{P}_{W_N}(Y = 1 \mid X \in C) \to P(Y = 1 \mid X \in C) \ \text{a.s.}, \quad \sharp W_N \to \infty, \quad N \to \infty.$$
If (9) is valid then the optimal function $f^*$ defined by (2) with $A = A^*$ introduced in (3)
has the form

$$f^{k_1,\ldots,k_r}(x) = \begin{cases} 1, & \text{if } P(Y = 1 \mid X \in C_{k_1,\ldots,k_r}(u)) > \gamma(\psi) \text{ and } x \in M, \\ -1, & \text{otherwise}, \end{cases} \tag{12}$$

here $u$ and $x$ satisfy (10) ($P(X \in C_{k_1,\ldots,k_r}(u)) \ge P(X = x) > 0$ as $x \in M$). Hence, for each
significant $\{k_1, \ldots, k_r\} \subset \{1, \ldots, n\}$ and any $\{m_1, \ldots, m_r\} \subset \{1, \ldots, n\}$ one has

$$Err(f^{k_1,\ldots,k_r}) \le Err(f^{m_1,\ldots,m_r}). \tag{13}$$

For arbitrary $\{m_1, \ldots, m_r\} \subset \{1, \ldots, n\}$, $x \in \mathbb{X}$, $u = \pi_{m_1,\ldots,m_r}\{x\}$ and a penalty function
$\psi$ we consider the prediction algorithm with a function $f^{m_1,\ldots,m_r}_{PA}$ such that

$$f^{m_1,\ldots,m_r}_{PA}(x, \xi_N(W_N)) = \begin{cases} 1, & \widehat{P}_{W_N}(Y = 1 \mid X \in C_{m_1,\ldots,m_r}(u)) > \widehat{\gamma}_{W_N}(\psi), \ x \in M, \\ -1, & \text{otherwise}, \end{cases} \tag{14}$$

here $\widehat{\gamma}_{W_N}(\psi)$ is a strongly consistent estimate of $\gamma(\psi)$ constructed by means of $\xi_N(W_N)$.
Introduce

$$U := \{x \in M : P(Y = 1 \mid X_{m_1} = x_{m_1}, \ldots, X_{m_r} = x_{m_r}) \neq \gamma(\psi)\}. \tag{15}$$

Using Corollary 1 (and in view of Examples 1 and 2 of [3]) we conclude that for any
$\{m_1, \ldots, m_r\} \subset \{1, \ldots, n\}$

$$\widehat{Err}_K(f^{m_1,\ldots,m_r}_{PA}, \xi_N) \to Err(f^{m_1,\ldots,m_r}) \ \text{a.s.}, \quad N \to \infty. \tag{16}$$

For each $\varepsilon > 0$, any significant collection $\{k_1, \ldots, k_r\} \subset \{1, \ldots, n\}$ and an arbitrary set
$\{m_1, \ldots, m_r\} \subset \{1, \ldots, n\}$, due to relations (13) and (16) one has

$$\widehat{Err}_K(f^{k_1,\ldots,k_r}_{PA}, \xi_N) \le \widehat{Err}_K(f^{m_1,\ldots,m_r}_{PA}, \xi_N) + \varepsilon \ \text{a.s.} \tag{17}$$

when $N$ is large enough.

Thus, for a given $r = 1, \ldots, n - 1$, according to (17) we come to the following conclusion.
It is natural to choose among the factors $X_1, \ldots, X_n$ a collection $X_{k_1}, \ldots, X_{k_r}$ leading to the
smallest estimated prediction error $\widehat{Err}_K(f^{k_1,\ldots,k_r}_{PA}, \xi_N)$. After that it is desirable to apply
permutation tests (see, e.g., jl] and [6]) for validation of the prediction power of the selected
factors. We do not tackle here the choice of $r$; some recommendations can be found in
[14]. Note also in passing that a nontrivial problem is to estimate the importance of various
collections of factors, see, e.g., [15].
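
The selection step just described can be illustrated by a short simulation. This is a sketch under explicit assumptions, not the procedure of any particular MDR implementation: the data are synthetic (four binary factors, with $Y$ depending on whether the first two differ), the helper names `train_rule` and `cv_error` are hypothetical, $\widehat{\gamma}$ is taken to be the training frequency of $Y = 1$, $\widehat{\psi}(y)$ is the reciprocal fold frequency of $y$, and the requirement $x \in M$ in (14) is replaced by "the cell was observed in the training part". Factor indices are 0-based in the code.

```python
import random
from itertools import combinations

def train_rule(train, idx):
    """Plug-in rule in the spirit of (14) for the factors with indices idx."""
    gamma_hat = sum(1 for _, y in train if y == 1) / len(train)
    cells = {}  # cell u -> (number of observations, number with Y = 1)
    for x, y in train:
        u = tuple(x[i] for i in idx)
        n, n1 = cells.get(u, (0, 0))
        cells[u] = (n + 1, n1 + (y == 1))
    def f(x):
        n, n1 = cells.get(tuple(x[i] for i in idx), (0, 0))
        return 1 if n > 0 and n1 / n > gamma_hat else -1
    return f

def cv_error(sample, K, idx):
    """K-fold estimated prediction error in the spirit of (4)."""
    N, b, total = len(sample), len(sample) // K, 0.0
    for k in range(K):
        lo, hi = k * b, ((k + 1) * b if k < K - 1 else N)
        test, rest = sample[lo:hi], sample[:lo] + sample[hi:]
        f = train_rule(rest, idx)
        for y in (-1, 1):
            n_y = sum(1 for _, yy in test if yy == y)
            if n_y:  # convention 0/0 := 0
                total += sum(1 for x, yy in test if yy == y and f(x) != y) / n_y
    return 2 * total / K

# Synthetic data: only factors 0 and 1 matter (assumed model, for illustration).
rng = random.Random(0)
sample = []
for _ in range(600):
    x = tuple(rng.randint(0, 1) for _ in range(4))
    p1 = 0.9 if x[0] != x[1] else 0.1
    sample.append((x, 1 if rng.random() < p1 else -1))

# Compare all collections of r = 2 factors and keep the one with the smallest error.
scores = {idx: cv_error(sample, 3, idx) for idx in combinations(range(4), 2)}
best = min(scores, key=scores.get)
```

In this simulated setting the pair `(0, 1)` is significant in the sense of (9), and its estimated error is far below that of the other pairs, which carry no information about $Y$.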

Remark 1. It is essential that for each $\{m_1, \ldots, m_r\} \subset \{1, \ldots, n\}$ we have strongly
consistent estimates of $Err(f^{m_1,\ldots,m_r})$. So to compare these estimates we can use a subset
of $\Omega$ having probability one. If we had only convergence in probability instead of
a.s. convergence in (16) then to compare different $\widehat{Err}_K(f^{m_1,\ldots,m_r}_{PA}, \xi_N)$ one should take into
account the Bonferroni corrections for all subsets $\{m_1, \ldots, m_r\}$ of $\{1, \ldots, n\}$.

Further on we consider a function $\psi$ having the form (8). In view of (3) w.l.g. we can
assume that $c = 1$ in (8). In this case $\gamma(\psi) = P(Y = 1)$. Introduce events

$$A_{N,k}(y) = \{Y^j = -y, \ j \in S_k(N)\}, \quad N \in \mathbb{N}, \quad k = 1, \ldots, K, \quad y \in \{-1, 1\},$$
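and random variables

$$\widehat{\psi}_{N,k}(y) := \frac{I\{\overline{A_{N,k}(y)}\}}{\widehat{P}_{S_k(N)}(Y = y)}, \tag{18}$$

where we write $\widehat{\psi}_{N,k}(y)$ instead of $\widehat{\psi}(y, S_k(N))$. Trivial cases $P(Y = y) \in \{0, 1\}$ are excluded
and we formally set $0/0 := 0$. Then

$$\widehat{\psi}_{N,k}(y) - \psi(y) = \frac{P(Y = y) - \widehat{P}_{S_k(N)}(Y = y)}{\widehat{P}_{S_k(N)}(Y = y) P(Y = y)}\, I\{\overline{A_{N,k}(y)}\} - \frac{I\{A_{N,k}(y)\}}{P(Y = y)}.$$

Clearly,

$$I\{A_{N,k}(y)\} \to 0 \ \text{a.s.}, \quad N \to \infty, \tag{19}$$

and the following relation is true

$$\frac{I\{\overline{A_{N,k}(y)}\}}{\widehat{P}_{S_k(N)}(Y = y)} \to \frac{1}{P(Y = y)} \ \text{a.s.}, \quad N \to \infty. \tag{20}$$

Therefore, by virtue of (18) - (20) we have that for $y \in \{-1, 1\}$ and $k = 1, \ldots, K$

$$\widehat{\psi}_{N,k}(y) - \psi(y) \to 0 \ \text{a.s.}, \quad N \to \infty. \tag{21}$$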

Let $\{m_1, \ldots, m_r\} \subset \{1, \ldots, n\}$. We define the functions which can be viewed as the
regularized versions of the estimates $f^{m_1,\ldots,m_r}_{PA}$ of $f^{m_1,\ldots,m_r}$ (see (14) and (12)). Namely, for
$W_N \subset \{1, \ldots, N\}$, $N \in \mathbb{N}$, and $\varepsilon = (\varepsilon_N)_{N \in \mathbb{N}}$ where non-random positive $\varepsilon_N \to 0$, as $N \to \infty$,
put

$$f^{m_1,\ldots,m_r}_{PA,\varepsilon}(x, \xi_N(W_N)) = \begin{cases} 1, & \widehat{P}_{W_N}(Y = 1 \mid X \in C_{m_1,\ldots,m_r}(u)) > \widehat{\gamma}_{W_N}(\psi) + \varepsilon_N, \ x \in M, \\ -1, & \text{otherwise}, \end{cases}$$

where $u = \pi_{m_1,\ldots,m_r}\{x\}$. Regularization of $f^{m_1,\ldots,m_r}_{PA}$ means that instead of the threshold
$\widehat{\gamma}_{W_N}(\psi)$ we use $\widehat{\gamma}_{W_N}(\psi) + \varepsilon_N$.

Take now $U$ appearing in (15). Applying Corollary 1 once again (and in view of Examples
1 and 2 of [3]) we can claim that the statements which are analogous to (16) and (17) are
valid for the regularized versions of the estimates introduced above. Now we turn to the
principal results, namely, central limit theorems.
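
The effect of the shifted threshold can be seen on a small example. The sketch below is an illustration only: the helper name `regularized_rule` and the data are hypothetical, $\widehat{\gamma}$ is taken to be the training frequency of $Y = 1$, and the rate $\varepsilon_N = N^{-1/4}$ is just one assumed choice of a positive sequence tending to zero.

```python
def regularized_rule(train, idx, eps):
    """Rule with threshold gamma_hat + eps; eps = 0 gives the unregularized rule."""
    gamma_hat = sum(1 for _, y in train if y == 1) / len(train)
    size, hits = {}, {}
    for x, y in train:
        u = tuple(x[i] for i in idx)
        size[u] = size.get(u, 0) + 1
        hits[u] = hits.get(u, 0) + (y == 1)
    def f(x):
        u = tuple(x[i] for i in idx)
        if u not in size:
            return -1  # cell never observed in the training data
        return 1 if hits[u] / size[u] > gamma_hat + eps else -1
    return f

# Hypothetical data with one factor: the cell frequencies 0.52 and 0.48 are close to gamma_hat = 0.5.
train = ([((0,), 1)] * 52 + [((0,), -1)] * 48 +
         [((1,), 1)] * 48 + [((1,), -1)] * 52)
N = len(train)
eps_N = N ** -0.25  # assumed rate; roughly 0.27 for N = 200

plain = regularized_rule(train, (0,), 0.0)
reg = regularized_rule(train, (0,), eps_N)
```

Here the unregularized rule assigns 1 to the cell `(0,)` on the strength of a 0.52 versus 0.50 comparison, whereas the regularized rule assigns -1 to both cells, which matches the value of $f$ in (12) when the true conditional probability equals $\gamma(\psi)$.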

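Theorem 2 Let $\varepsilon_N \to 0$ and $N^{1/2}\varepsilon_N \to \infty$ as $N \to \infty$. Then, for each $K \in \mathbb{N}$, any subset
$\{m_1, \ldots, m_r\}$ of $\{1, \ldots, n\}$, the corresponding function $f = f^{m_1,\ldots,m_r}$ and the prediction algorithm
defined by $f_{PA} = f^{m_1,\ldots,m_r}_{PA,\varepsilon}$, the following relation holds:

$$\sqrt{N}\,(\widehat{Err}_K(f_{PA}, \xi_N) - Err(f)) \xrightarrow{law} Z \sim N(0, \sigma^2), \quad N \to \infty, \tag{22}$$

where $\sigma^2$ is the variance of the random variable

$$V = 2 \sum_{y \in \{-1,1\}} \frac{I\{Y = y\}}{P(Y = y)} \big( I\{f(X) \neq y\} - P(f(X) \neq y \mid Y = y) \big). \tag{23}$$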
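Proof. For a fixed $K \in \mathbb{N}$ and any $N \in \mathbb{N}$ set

$$\widehat{T}_N(f) := \frac{2}{K} \sum_{k=1}^{K} \frac{1}{\sharp S_k(N)} \sum_{y \in \{-1,1\}} \widehat{\psi}_{N,k}(y) \sum_{j \in S_k(N)} I\{Y^j = y, f(X^j) \neq y\},$$

$$T_N(f) := \frac{2}{K} \sum_{k=1}^{K} \frac{1}{\sharp S_k(N)} \sum_{y \in \{-1,1\}} \psi(y) \sum_{j \in S_k(N)} I\{Y^j = y, f(X^j) \neq y\}.$$

One has

$$\widehat{Err}_K(f_{PA}, \xi_N) - Err(f) = (\widehat{Err}_K(f_{PA}, \xi_N) - \widehat{T}_N(f)) + (\widehat{T}_N(f) - T_N(f)) + (T_N(f) - Err(f)). \tag{24}$$

First of all we show that

$$\sqrt{N}\,(\widehat{Err}_K(f_{PA}, \xi_N) - \widehat{T}_N(f)) \xrightarrow{P} 0, \quad N \to \infty. \tag{25}$$

For $x \in \mathbb{X}$, $y \in \{-1, 1\}$, $k = 1, \ldots, K$ and $N \in \mathbb{N}$ introduce

$$F_{N,k}(x, y) := I\{f_{PA}(x, \xi_N(\overline{S_k(N)})) \neq y\} - I\{f(x) \neq y\}.$$

Then

$$\widehat{Err}_K(f_{PA}, \xi_N) - \widehat{T}_N(f) = \frac{2}{K} \sum_{k=1}^{K} \frac{1}{\sharp S_k(N)} \sum_{y \in \{-1,1\}} \widehat{\psi}_{N,k}(y) \sum_{j \in S_k(N)} I\{Y^j = y\} F_{N,k}(X^j, y). \tag{26}$$

We define the random variables

$$B_{N,k}(y) := \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} I\{Y^j = y\} F_{N,k}(X^j, y)$$

and verify that for each $k = 1, \ldots, K$

$$\sum_{y \in \{-1,1\}} \widehat{\psi}_{N,k}(y) B_{N,k}(y) \xrightarrow{P} 0, \quad N \to \infty. \tag{27}$$

Clearly (27) implies (25) in view of (26) as $\sharp S_k(N) = [N/K]$ for $k = 1, \ldots, K - 1$ and
$[N/K] \le \sharp S_K(N) < [N/K] + K$. Write $B_{N,k}(y) = B^{(1)}_{N,k}(y) + B^{(2)}_{N,k}(y)$ where

$$B^{(1)}_{N,k}(y) = \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} I\{X^j \in U\} I\{Y^j = y\} F_{N,k}(X^j, y),$$

$$B^{(2)}_{N,k}(y) = \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} I\{X^j \notin U\} I\{Y^j = y\} F_{N,k}(X^j, y).$$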
Obviously 
Functions $f_{PA}$ and $f$ take values in the set $\{-1, 1\}$. Thus, for any $x \in U$ (where $U$ is
defined in (15)), $k = 1, \ldots, K$ and almost all $\omega \in \Omega$ relation (6) ensures the existence of
an integer $N_0(x, k, \omega)$ such that $f_{PA}(x, \xi_N(\overline{S_k(N)})) = f(x)$ for $N \ge N_0(x, k, \omega)$. Hence
$B^{(1)}_{N,k}(y) = 0$ for any $y$ belonging to $\{-1, 1\}$, each $k = 1, \ldots, K$ and almost all $\omega \in \Omega$ when
$N \ge N_{0,k}(\omega) = \max_{x \in U} N_0(x, k, \omega)$. Evidently, $N_{0,k} < \infty$ a.s., because $\sharp U < \infty$. We obtain
that

$$\sum_{y \in \{-1,1\}} \widehat{\psi}_{N,k}(y) B^{(1)}_{N,k}(y) \to 0 \ \text{a.s.}, \quad N \to \infty. \tag{28}$$

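If $U = \mathbb{X}$ then $B^{(2)}_{N,k}(y) = 0$ for all $N$, $k$ and $y$ under consideration. Consequently, (27) is
valid and thus, for $U = \mathbb{X}$, relation (25) holds. Let now $U \neq \mathbb{X}$. Then for $k = 1, \ldots, K$ and
$N \in \mathbb{N}$ one has

$$\sum_{y \in \{-1,1\}} \widehat{\psi}_{N,k}(y) B^{(2)}_{N,k}(y) = \sum_{x \in \mathbb{X}_+} \sum_{y \in \{-1,1\}} H_{N,k}(x, y) + \sum_{x \in \mathbb{X}_-} \sum_{y \in \{-1,1\}} H_{N,k}(x, y),$$

here $\mathbb{X}_+ = (\mathbb{X} \setminus U) \cap \{x \in \mathbb{X} : f(x) = 1\}$, $\mathbb{X}_- = (\mathbb{X} \setminus U) \cap \{x \in \mathbb{X} : f(x) = -1\}$ and

$$H_{N,k}(x, y) := \frac{\widehat{\psi}_{N,k}(y)}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} I\{A^j(x, y)\} \big( I\{f_{PA}(x, \xi_N(\overline{S_k(N)})) \neq y\} - I\{f(x) \neq y\} \big)$$

where $A^j(x, y) = \{X^j = x, Y^j = y\}$. The definition of $U$ yields that $\mathbb{X}_+ = \emptyset$ and
$\mathbb{X}_- = \overline{M} \cup \{x \in M : P(Y = 1 \mid X_{m_1} = x_{m_1}, \ldots, X_{m_r} = x_{m_r}) = \gamma(\psi)\}$.

Set

$$R^j_{N,k}(x) = I\{X^j = x\} \big( \widehat{\psi}_{N,k}(1) I\{Y^j = 1\} - \widehat{\psi}_{N,k}(-1) I\{Y^j = -1\} \big).$$

It is easily seen that

$$\sum_{x \in \mathbb{X}_-} \sum_{y \in \{-1,1\}} H_{N,k}(x, y) = - \sum_{x \in \mathbb{X}_-} I\{f_{PA}(x, \xi_N(\overline{S_k(N)})) = 1\} \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} R^j_{N,k}(x).$$

Note that $R^j_{N,k}(x) = 0$ a.s. for all $x \in \overline{M}$, $k = 1, \ldots, K$, $j = 1, \ldots, N$ and $N \in \mathbb{N}$. Let us
prove that, for any $x \in M \cap \mathbb{X}_-$ and $k = 1, \ldots, K$,

$$I\{f_{PA}(x, \xi_N(\overline{S_k(N)})) = 1\} \xrightarrow{P} 0, \quad N \to \infty. \tag{29}$$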
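For any $\nu > 0$ and $x \in M \cap \mathbb{X}_-$ we have

$$P\big(I\{f_{PA}(x, \xi_N(\overline{S_k(N)})) = 1\} > \nu\big) \le P\big(\widehat{P}_{\overline{S_k(N)}}(Y = 1 \mid X_{m_1} = x_{m_1}, \ldots, X_{m_r} = x_{m_r}) > \widehat{\gamma}_{\overline{S_k(N)}}(\psi) + \varepsilon_N\big).$$

Now we show that, for $k = 1, \ldots, K$, this probability tends to 0 as $N \to \infty$. For $W_N \subset \{1, \ldots, N\}$
and $x \in M \cap \mathbb{X}_-$, put

$$\Delta_N(W_N, x) := P\Big( \frac{\sum_{j \in W_N} \eta^j}{\sum_{j \in W_N} \zeta^j} > \widehat{\gamma}_{W_N}(\psi) + \varepsilon_N \Big)$$

where $\eta^j = I\{Y^j = 1, X^j_{m_1} = x_{m_1}, \ldots, X^j_{m_r} = x_{m_r}\}$, $\zeta^j = I\{X^j_{m_1} = x_{m_1}, \ldots, X^j_{m_r} = x_{m_r}\}$,
$j = 1, \ldots, N$. Set $p = P(X_{m_1} = x_{m_1}, \ldots, X_{m_r} = x_{m_r})$. It follows that, for any $\alpha_N > 0$,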



< P 



A N (W N ,x) 



a N , 



jew N 



+ p (lsk.S CI '-H^~) +p (lpk^ I{y<=1> - p(y=1) l SQN )- (30) 



Due to the Hoeffding inequality

$$P\Big( \Big| \frac{1}{\sharp W_N} \sum_{j \in W_N} \zeta^j - p \Big| > \alpha_N \Big) \le 2 \exp\{-2\, \sharp W_N\, \alpha_N^2\} =: \delta_N(W_N, \alpha_N).$$
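
For orientation, the Hoeffding bound for indicator variables can be compared with the exact binomial tail. The sketch below is purely illustrative; the values of `n`, `p` and `alpha` are arbitrary assumptions, with `n` playing the role of $\sharp W_N$ and `alpha` that of $\alpha_N$.

```python
from math import comb, exp

def exact_tail(n, p, alpha):
    """P(|S/n - p| > alpha) for S ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if abs(k / n - p) > alpha)

def hoeffding_bound(n, alpha):
    """Right-hand side 2 * exp(-2 * n * alpha^2) of the Hoeffding inequality."""
    return 2 * exp(-2 * n * alpha**2)

# Assumed illustrative values.
cases = [(50, 0.3, 0.1), (200, 0.3, 0.1), (200, 0.5, 0.05)]
table = [(n, p, a, exact_tail(n, p, a), hoeffding_bound(n, a)) for n, p, a in cases]
```

The bound does not depend on $p$, which is what makes it convenient in the argument above, at the price of being conservative for a given $p$.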

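We have an analogous estimate for the last summand in (30). Consequently, taking into
account that $p > 0$ we see that for all $N$ large enough

$$\Delta_N(W_N, x) \le P\Big( \frac{1}{\sharp W_N} \sum_{j \in W_N} \eta^j > (p - \alpha_N)(\gamma(\psi) - \alpha_N + \varepsilon_N) \Big) + 2\delta_N(W_N, \alpha_N).$$

Whenever $x \in M \cap \mathbb{X}_-$ one has

$$P(Y = 1, X_{m_1} = x_{m_1}, \ldots, X_{m_r} = x_{m_r}) = P(Y = 1) P(X_{m_1} = x_{m_1}, \ldots, X_{m_r} = x_{m_r}),$$

therefore

$$\Delta_N(W_N, x) \le P\Big( \frac{1}{\sqrt{\sharp W_N}} \sum_{j \in W_N} (\eta^j - E\eta^j) > \sqrt{\sharp W_N}\, \big(p\varepsilon_N - \alpha_N(\gamma(\psi) + p - \alpha_N + \varepsilon_N)\big) \Big) + 2\delta_N(W_N, \alpha_N).$$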

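The CLT holds for an array $\{\eta^j, j \in W_N, N \in \mathbb{N}\}$ consisting of i.i.d. random variables,
thus

$$\frac{1}{\sqrt{\sharp W_N}} \sum_{j \in W_N} (\eta^j - E\eta^j) \xrightarrow{law} Z_0 \sim N(0, \sigma_0^2),$$

here $\sigma_0^2 = \mathrm{var}\, I\{Y = 1, X_{m_1} = x_{m_1}, \ldots, X_{m_r} = x_{m_r}\}$. Hence $\Delta_N(W_N, x) \to 0$ if, for some
$\alpha_N > 0$,

$$\alpha_N \sqrt{\sharp W_N} \to \infty, \quad \varepsilon_N \sqrt{\sharp W_N} \to \infty, \quad \alpha_N / \varepsilon_N \to 0 \quad \text{as } N \to \infty. \tag{31}$$

Take $W_N = \overline{S_k(N)}$ with $k = 1, \ldots, K$. Then $\sharp \overline{S_k(N)} \ge (K - 1)[N/K]$ for $k = 1, \ldots, K$ and
we conclude that (31) is satisfied when $\varepsilon_N N^{1/2} \to \infty$ as $N \to \infty$ if we choose a sequence
$(\alpha_N)_{N \in \mathbb{N}}$ in an appropriate way. So, relation (29) is established.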
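Let

$$R^j(x) = I\{X^j = x\} \big( \psi(1) I\{Y^j = 1\} - \psi(-1) I\{Y^j = -1\} \big), \quad x \in \mathbb{X}, \quad j \in \mathbb{N}.$$

For all $x \in M \cap \mathbb{X}_-$ one has

$$\frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} R^j_{N,k}(x) = \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} R^j(x)$$
$$+ \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} I\{X^j = x\} \big[ (\widehat{\psi}_{N,k}(1) - \psi(1)) I\{Y^j = 1\} - (\widehat{\psi}_{N,k}(-1) - \psi(-1)) I\{Y^j = -1\} \big].$$

Note that $E R^j(x) = 0$ for all $j \in \mathbb{N}$ and $x \in \mathbb{X}_-$. The CLT for an array of i.i.d. random
variables $\{R^j(x), j \in S_k(N), N \in \mathbb{N}\}$ provides that

$$\frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} R^j(x) \xrightarrow{law} Z_1 \sim N(0, \sigma_1^2(x)), \quad N \to \infty,$$

where $\sigma_1^2(x) = \mathrm{var}\big(I\{X = x\}(\psi(1) I\{Y = 1\} - \psi(-1) I\{Y = -1\})\big)$, $x \in \mathbb{X}_-$. For each
$y \in \{-1, 1\}$,

$$(\widehat{\psi}_{N,k}(y) - \psi(y)) \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} I\{X^j = x\} I\{Y^j = y\}$$
$$= (\widehat{\psi}_{N,k}(y) - \psi(y)) \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} \big( I\{X^j = x\} I\{Y^j = y\} - E I\{X^j = x\} I\{Y^j = y\} \big)$$
$$+ (\widehat{\psi}_{N,k}(y) - \psi(y)) \sqrt{\sharp S_k(N)}\, P(X = x, Y = y).$$

Due to the CLT

$$\frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} \big( I\{X^j = x\} I\{Y^j = y\} - E I\{X^j = x\} I\{Y^j = y\} \big) \xrightarrow{law} Z_2 \sim N(0, \sigma_2^2(x, y))$$

as $N \to \infty$, where $\sigma_2^2(x, y) = \mathrm{var}\, I\{X = x, Y = y\}$. In view of (21) we have

$$(\widehat{\psi}_{N,k}(y) - \psi(y)) \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} \big( I\{X^j = x\} I\{Y^j = y\} - E I\{X^j = x\} I\{Y^j = y\} \big) \xrightarrow{P} 0$$

as $N \to \infty$. Now we apply (18) - (20) once again to conclude that

$$(\widehat{\psi}_{N,k}(y) - \psi(y)) \sqrt{\sharp S_k(N)} \xrightarrow{law} Z_3 \sim N(0, \sigma_3^2(y)), \quad N \to \infty,$$

with $\sigma_3^2(y) = P(Y = -y)(P(Y = y))^{-3}$. Thus,

$$\sum_{y \in \{-1,1\}} \widehat{\psi}_{N,k}(y) B^{(2)}_{N,k}(y) \xrightarrow{P} 0, \quad N \to \infty. \tag{32}$$

Taking into account (28) and (32) we come to (27) and consequently to (25).
Now we turn to the study of $\widehat{T}_N(f) - T_N(f)$ appearing in (24). One has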



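$$\sqrt{N}\,(\widehat{T}_N(f) - T_N(f)) = \frac{2\sqrt{N}}{K} \sum_{k=1}^{K} \frac{1}{\sharp S_k(N)} \sum_{y \in \{-1,1\}} (\widehat{\psi}_{N,k}(y) - \psi(y)) \sum_{j \in S_k(N)} I\{Y^j = y, f(X^j) \neq y\}.$$

For each $k = 1, \ldots, K$

$$\sum_{y \in \{-1,1\}} (\widehat{\psi}_{N,k}(y) - \psi(y)) \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} I\{Y^j = y, f(X^j) \neq y\}$$
$$= \sum_{y \in \{-1,1\}} (\widehat{\psi}_{N,k}(y) - \psi(y)) \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} \big( I\{Y^j = y, f(X^j) \neq y\} - P(Y = y, f(X) \neq y) \big)$$
$$+ \sqrt{\sharp S_k(N)} \sum_{y \in \{-1,1\}} (\widehat{\psi}_{N,k}(y) - \psi(y)) P(Y = y, f(X) \neq y).$$

Due to (21) and the CLT for the array of i.i.d. indicators $\{I\{Y^j = y, f(X^j) \neq y\}, j \in S_k(N), N \in \mathbb{N}\}$ we have

$$\sum_{y \in \{-1,1\}} (\widehat{\psi}_{N,k}(y) - \psi(y)) \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} \big( I\{Y^j = y, f(X^j) \neq y\} - P(Y = y, f(X) \neq y) \big) \xrightarrow{P} 0$$

as $N \to \infty$. Consequently the limit distribution of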

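$$\sqrt{N}\,\big[(\widehat{T}_N(f) - T_N(f)) + (T_N(f) - Err(f))\big]$$

will be the same as for the random variables

$$\sqrt{N}\,\Big[(T_N(f) - Err(f)) + \frac{2}{K} \sum_{k=1}^{K} \sum_{y \in \{-1,1\}} (\widehat{\psi}_{N,k}(y) - \psi(y)) P(Y = y, f(X) \neq y)\Big]. \tag{33}$$

Note that for each $y \in \{-1, 1\}$ and $k = 1, \ldots, K$

$$\widehat{P}_{S_k(N)}(Y = y) - P(Y = y) \xrightarrow{P} 0,$$

$$\sqrt{\sharp S_k(N)}\,(\widehat{P}_{S_k(N)}(Y = y) - P(Y = y)) \xrightarrow{law} Z_4 \sim N(0, \sigma_4^2),$$

as $N \to \infty$, where $\sigma_4^2 = P(Y = -1)P(Y = 1)$.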

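Now the Slutsky lemma shows that the limit behavior of the random variables introduced
in (33) will be the same as for the random variables

$$\sqrt{N}\,(T_N(f) - Err(f)) - \frac{2\sqrt{N}}{K} \sum_{k=1}^{K} \sum_{y \in \{-1,1\}} \big(\widehat{P}_{S_k(N)}(Y = y) - P(Y = y)\big) \frac{P(Y = y, f(X) \neq y)}{P(Y = y)^2}$$

$$= \frac{2\sqrt{N}}{K} \sum_{k=1}^{K} \frac{1}{\sharp S_k(N)} \sum_{y \in \{-1,1\}} \sum_{j \in S_k(N)} \Big( \frac{I\{Y^j = y, f(X^j) \neq y\} - P(Y = y, f(X) \neq y)}{P(Y = y)}$$
$$- \frac{(I\{Y^j = y\} - P(Y = y))\, P(Y = y, f(X) \neq y)}{P(Y = y)^2} \Big)$$

$$= \frac{\sqrt{N}}{K} \sum_{k=1}^{K} \frac{1}{\sharp S_k(N)} \sum_{j \in S_k(N)} (V^j - EV^j)$$

where

$$V^j = \sum_{y \in \{-1,1\}} \frac{2 I\{Y^j = y\}}{P(Y = y)} \big( I\{f(X^j) \neq y\} - P(f(X) \neq y \mid Y = y) \big).$$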
For each $k = 1, \ldots, K$, the CLT for an array $\{V^j, j \in S_k(N), N \in \mathbb{N}\}$ of i.i.d. random
variables yields the relation

$$Z_{N,k} := \frac{1}{\sqrt{\sharp S_k(N)}} \sum_{j \in S_k(N)} (V^j - EV^j) \xrightarrow{law} Z \sim N(0, \sigma^2), \quad N \to \infty,$$

where $\sigma^2 = \mathrm{var}\, V$ and $V$ was introduced in (23). Since $Z_{N,1}, \ldots, Z_{N,K}$ are independent and
$\sqrt{N} / \sqrt{\sharp S_k(N)} \to \sqrt{K}$ for $k = 1, \ldots, K$, as $N \to \infty$, we come to (22). The proof is complete.
□
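
For a known law of $(X, Y)$ the limit variance $\sigma^2 = \mathrm{var}\, V$ from (23) can be evaluated by direct enumeration. The sketch below uses a hypothetical law with a single factor $X \in \{0, 1, 2\}$ and the rule $f$ obtained from (12) with $\gamma(\psi) = P(Y = 1)$; all numbers are assumptions for illustration. Since, given $Y = y$, the variable $V$ is a centered indicator scaled by $2/P(Y = y)$, the enumeration should agree with the expression $4 \sum_y q_y (1 - q_y)/P(Y = y)$, where $q_y = P(f(X) \neq y \mid Y = y)$; the code checks this.

```python
# Hypothetical law of (X, Y): keys are (x, y), values are P(X = x, Y = y).
joint = {
    (0, 1): 0.10, (0, -1): 0.25,
    (1, 1): 0.15, (1, -1): 0.15,
    (2, 1): 0.30, (2, -1): 0.05,
}

p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (-1, 1)}
gamma = p_y[1]  # gamma(psi) = P(Y = 1) for the penalty (8) with c = 1

def cond_p1(x):
    """P(Y = 1 | X = x)."""
    return joint[(x, 1)] / (joint[(x, 1)] + joint[(x, -1)])

# Rule (12) for this single-factor toy law.
f = {x: 1 if cond_p1(x) > gamma else -1 for x in (0, 1, 2)}

# q_y = P(f(X) != y | Y = y)
q = {y: sum(p for (x, yy), p in joint.items() if yy == y and f[x] != y) / p_y[y]
     for y in (-1, 1)}

def V(x, y):
    """Value of the random variable (23) on the outcome (X, Y) = (x, y)."""
    return 2 / p_y[y] * ((f[x] != y) - q[y])

mean_V = sum(p * V(x, y) for (x, y), p in joint.items())
sigma2 = sum(p * V(x, y) ** 2 for (x, y), p in joint.items()) - mean_V ** 2
closed_form = 4 * sum(q[y] * (1 - q[y]) / p_y[y] for y in (-1, 1))
```

In particular $\sigma^2 = 0$ only when each $q_y$ equals 0 or 1, that is, when $f(X)$ classifies each class either always correctly or always incorrectly.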

Recall that for a sequence of random variables $(\eta_N)_{N \in \mathbb{N}}$ and a sequence of positive
numbers $(a_N)_{N \in \mathbb{N}}$ one writes $\eta_N = o_P(a_N)$ if $\eta_N / a_N \xrightarrow{P} 0$, $N \to \infty$.

Remark 2. As usual one can view the CLT as a result describing the exact rate of
approximation for the random variables under consideration. Theorem 2 implies that

$$\widehat{Err}_K(f_{PA}, \xi_N) - Err(f) = o_P(a_N), \quad N \to \infty, \tag{34}$$

where $N^{-1/2} = o(a_N)$. The last relation is optimal in a sense whenever $\sigma^2 > 0$, i.e. one
cannot take $a_N = O(N^{-1/2})$ in (34).

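Remark 3. In view of (17) it is not difficult to construct consistent estimates $\widehat{\sigma}_N$ of the
unknown $\sigma$ appearing in (22). Therefore (if $\sigma^2 \neq 0$) we can claim that under the conditions of
Theorem 2

$$\frac{\sqrt{N}}{\widehat{\sigma}_N} \big(\widehat{Err}_K(f_{PA}, \xi_N) - Err(f)\big) \xrightarrow{law} \frac{Z}{\sigma} \sim N(0, 1), \quad N \to \infty.$$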
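
The self-normalized relation suggests approximate confidence intervals for $Err(f)$. The text does not specify how $\widehat{\sigma}_N$ is built, so the following sketch rests on assumptions: it uses a plug-in estimate obtained by replacing the probabilities in (23) with sample frequencies, a fixed hypothetical rule `f` standing in for the trained one, a made-up sample, and no cross-validation. The constant 1.96 is the usual 0.975 quantile of the standard normal law.

```python
from math import sqrt

# Hypothetical labelled sample of pairs (x, y) and a fixed rule f(x) = x on {-1, 1}.
sample = [(1, 1)] * 40 + [(-1, 1)] * 10 + [(-1, -1)] * 35 + [(1, -1)] * 15
f = lambda x: x
N = len(sample)

# Sample frequencies of Y = y and of misclassification within each class.
p_hat = {y: sum(1 for _, yy in sample if yy == y) / N for y in (-1, 1)}
q_hat = {y: sum(1 for x, yy in sample if yy == y and f(x) != y) / (N * p_hat[y])
         for y in (-1, 1)}

# Estimated error with psi_hat(y) = 1 / p_hat[y]: it reduces to 2 * (q_hat[-1] + q_hat[1]).
err_hat = 2 * sum(q_hat.values())

# Plug-in version of V from (23) evaluated at each observation.
v = [2 / p_hat[y] * ((f(x) != y) - q_hat[y]) for x, y in sample]
mean_v = sum(v) / N
sigma2_hat = sum((t - mean_v) ** 2 for t in v) / N

# Approximate 95% interval based on the normal limit.
half_width = 1.96 * sqrt(sigma2_hat / N)
interval = (err_hat - half_width, err_hat + half_width)
```

Any other consistent estimate of $\sigma$ could be used in the same way; only the normal approximation itself comes from the relation above.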

Now we consider the multidimensional version of Theorem 2. To simplify notation set
$\alpha = (m_1, \ldots, m_r)$. We write $f^{\alpha}_{PA,\varepsilon}$ and $f^{\alpha}$ instead of $f^{m_1,\ldots,m_r}_{PA,\varepsilon}$ and $f^{m_1,\ldots,m_r}$, respectively.
Employing the Cramér-Wold device and the proof of Theorem 2 we come to the following
statement (as usual we use column vectors and write $\top$ for transposition).

Theorem 3 Let $\varepsilon_N \to 0$ and $N^{1/2}\varepsilon_N \to \infty$ as $N \to \infty$. Then, for each $K \in \mathbb{N}$, any
$\alpha(i) = \{m^{(i)}_1, \ldots, m^{(i)}_r\} \subset \{1, \ldots, n\}$ where $i = 1, \ldots, s$, one has

$$\sqrt{N}\,(Z^{(1)}_N, \ldots, Z^{(s)}_N)^{\top} \xrightarrow{law} Z \sim N(0, C), \quad N \to \infty.$$

Here $Z^{(i)}_N = \widehat{Err}_K(f^{\alpha(i)}_{PA,\varepsilon}, \xi_N) - Err(f^{\alpha(i)})$, $i = 1, \ldots, s$, and the elements of the covariance matrix
$C = (c_{i,j})$ have the form

$$c_{i,j} = \mathrm{cov}(V(\alpha(i)), V(\alpha(j))), \quad i, j = 1, \ldots, s,$$

the random variables $V(\alpha(i))$ being defined in the same way as $V$ in (23) with $f^{m_1,\ldots,m_r}$
replaced by $f^{\alpha(i)}$.

To conclude we note (see also Remark 3) that one can construct consistent estimates $\widehat{C}_N$
of the unknown (nondegenerate) covariance matrix $C$ to obtain the statistical version of
the last theorem. Namely, under the conditions of Theorem 3 the following relation is valid

$$(\widehat{C}_N)^{-1/2} \sqrt{N}\,(Z^{(1)}_N, \ldots, Z^{(s)}_N)^{\top} \xrightarrow{law} C^{-1/2} Z \sim N(0, I), \quad N \to \infty,$$

where $I$ stands for the unit matrix of order $s$.